feat(Doclang): add content layer support #568
Merged
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
✅ DCO Check Passed
Thanks @vagenas, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates
Wonderful, this rule succeeded.
When test data is updated, we require two reviewers
Codecov Report❌ Patch coverage is
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
dolfim-ibm
approved these changes
Mar 30, 2026
cau-git
approved these changes
Mar 30, 2026
Documentation Updates
1 document(s) were updated by changes in this PR: Content Layers
@@ -92,6 +92,47 @@
The corresponding save methods (`save_as_text()`, `save_as_markdown()`, `save_as_html()`, `save_as_vtt()`) support the same parameters as their export counterparts, with one difference: `save_as_vtt()` defaults `omit_voice_end` to True (while `export_to_vtt()` defaults it to False) for more concise output files.
+### Doclang (Experimental) Serializer
+
+The experimental Doclang serializer (`docling_core/experimental/doclang.py`) also supports content layer filtering and annotation via `DoclangParams`.
+
+**Content layer filtering:** The `layers` field controls which content layers are serialized. It accepts a `set[ContentLayer]` and defaults to all content layers. To serialize only the body layer, for example:
+
+```python
+from docling_core.experimental.doclang import DoclangParams
+from docling_core.types.doc import ContentLayer
+
+params = DoclangParams(layers={ContentLayer.BODY})
+```
+
+**Layer annotation:** The `layer_mode` field of type `LayerMode` controls whether a `<layer class="..."/>` self-closing XML token is emitted for each item:
+- `LayerMode.MINIMAL` (default): emits `<layer class="..."/>` only when the item's content layer differs from `ContentLayer.BODY`.
+- `LayerMode.ALWAYS`: emits `<layer class="..."/>` for every item, regardless of its layer.
+
+```python
+from docling_core.experimental.doclang import DoclangParams, LayerMode
+from docling_core.types.doc import ContentLayer
+
+params = DoclangParams(
+    layers={ContentLayer.BODY, ContentLayer.FURNITURE},
+    layer_mode=LayerMode.MINIMAL,
+)
+```
+
+In the serialized XML output, content layer information appears as an embedded self-closing token, for example:
+
+```xml
+<page_header>
+ <layer class="furniture"/>
+ Page Header
+</page_header>
+<text>
+ Main body content
+</text>
+```
+
+With `LayerMode.ALWAYS`, `<layer class="body"/>` would also appear inside the `<text>` block above.
+
## Iterating Over Items Including Furniture
For advanced use cases, such as iterating over document items including headers and footers, use the `iterate_items` method with the appropriate content layers:
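The hunk is cut off at this point. Separately, the `layers` filter and the `layer_mode` rule documented in the added section can be modeled without docling-core installed. The following is a minimal sketch under stated assumptions, not the code in `docling_core/experimental/doclang.py`: the two enums are local stand-ins (the `LayerMode` values and the reduced `ContentLayer` member set are assumed), and `Item`, `layer_token`, and `serialize` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ContentLayer(str, Enum):
    """Local stand-in for docling-core's ContentLayer (only two members modeled)."""
    BODY = "body"
    FURNITURE = "furniture"


class LayerMode(str, Enum):
    """Local stand-in for LayerMode; the string values are assumptions."""
    MINIMAL = "minimal"
    ALWAYS = "always"


@dataclass
class Item:
    """Hypothetical document item: element tag, text, and content layer."""
    tag: str
    text: str
    layer: ContentLayer


def layer_token(layer: ContentLayer, mode: LayerMode) -> str:
    """Return the self-closing layer token, or '' when the mode suppresses it."""
    # ALWAYS annotates every item; MINIMAL annotates only non-body items.
    if mode is LayerMode.ALWAYS or layer is not ContentLayer.BODY:
        return f'<layer class="{layer.value}"/>'
    return ""


def serialize(items, layers, mode=LayerMode.MINIMAL) -> str:
    """Drop items outside `layers`, then wrap each remaining item in its tag."""
    lines = []
    for item in items:
        if item.layer not in layers:
            continue  # filtered out, as with DoclangParams(layers=...)
        token = layer_token(item.layer, mode)
        lines.append(f"<{item.tag}>{token}{item.text}</{item.tag}>")
    return "\n".join(lines)


items = [
    Item("page_header", "Page Header", ContentLayer.FURNITURE),
    Item("text", "Main body content", ContentLayer.BODY),
]

# Body only: the furniture item is dropped and no token is emitted.
print(serialize(items, {ContentLayer.BODY}))
# Both layers, ALWAYS: every item carries a layer token.
print(serialize(items, set(ContentLayer), LayerMode.ALWAYS))
```

The sketch keeps filtering and annotation as two independent decisions, mirroring how the documentation describes `layers` and `layer_mode` as separate fields. It writes each element on a single line for brevity; the real serializer's whitespace and element set may differ.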
ceberam pushed a commit to odelliab/docling-core that referenced this pull request Apr 9, 2026
* feat(Doclang): add content layer support
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* rename layer attribute
  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
--------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
No description provided.